natural language processing
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.05)
- Europe > Russia (0.04)
- (6 more...)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (13 more...)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Dominican Republic (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (6 more...)
Empirical Cumulative Distribution Function Clustering for LLM-based Agent System Analysis
Watanabe, Chihiro, Sun, Jingyu
Large language models (LLMs) are increasingly used as agents to solve complex tasks such as question answering (QA), scientific debate, and software development. A standard evaluation procedure aggregates multiple responses from LLM agents into a single final answer, often via majority voting, and compares it against reference answers. However, this process can obscure the quality and distributional characteristics of the original responses. In this paper, we propose a novel evaluation framework based on the empirical cumulative distribution function (ECDF) of cosine similarities between generated responses and reference answers. This enables a more nuanced assessment of response quality beyond exact match metrics. To analyze the response distributions across different agent configurations, we further introduce a clustering method for ECDFs using their distances and the $k$-medoids algorithm. Our experiments on a QA dataset demonstrate that ECDFs can distinguish between agent settings with similar final accuracies but different quality distributions. The clustering analysis also reveals interpretable group structures in the responses, offering insights into the impact of temperature, persona, and question topics.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Tuscany > Florence (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (7 more...)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- (2 more...)
- Research Report (0.68)
- Overview (0.46)
Text Alignment Is An Efficient Unified Model for Massive NLP Tasks
Large language models (LLMs), typically designed as a function of next-word prediction, have excelled across extensive NLP tasks. Despite the generality, next-word prediction is often not an efficient formulation for many of the tasks, demanding an extreme scale of model parameters (10s or 100s of billions) and sometimes yielding suboptimal performance. In practice, it is often desirable to build more efficient models--despite being less versatile, they still apply to a substantial subset of problems, delivering on par or even superior performance with much smaller model sizes.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Hong Kong (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (19 more...)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Hong Kong (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- (10 more...)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Indiana (0.05)
- North America > Canada > Ontario > Toronto (0.04)
- (6 more...)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Government > Regional Government > North America Government > United States Government (0.67)